Abstract

Using the method of symbolic dynamics, we show that a large class of classical 
chaotic maps exhibit exponential hypersensitivity to perturbation, i.e., a rapid 
increase with time of the information needed to describe the perturbed time 
evolution of the Liouville density, the information attaining values that are 
exponentially larger than the entropy increase that results from averaging over 
the perturbation. The exponential rate of growth of the ratio of information to 
entropy is given by the Kolmogorov-Sinai entropy of the map. These findings 
generalize and extend results obtained for the baker's map [R. Schack and 
C. M. Caves, Phys. Rev. Lett. 69, 3413 (1992)]. 

I. INTRODUCTION 

Chaos in Hamiltonian systems is usually defined in terms of trajectories of phase-space 
points. The Lyapunov exponent describes how initially close trajectories diverge
exponentially [?]. The Kolmogorov-Sinai (KS) entropy measures the rate at which information about
the initial phase-space point must be supplied in order to predict the coarse-grained behavior
of a trajectory at a later time [?].

Signatures of chaos are less obvious if attention is shifted from the time evolution of 
phase-space points to the time evolution of probability densities, governed by the Liouville 
equation. If the distance between two densities is defined in terms of an overlap integral, 
there is no exponential divergence of initially close densities since the overlap integral is 
constant in time ("Koopman's theorem" [?]). Furthermore, as a direct consequence of
Koopman's theorem, if one is given the Hamiltonian and the initial density to a certain
accuracy, then no additional information is needed to predict the density at all later times
t to the same accuracy, except for a negligible amount of information needed to specify the
time t [?]. This means that the popular information-theoretic interpretation [?] of chaos
via the KS entropy does not apply to Liouville densities.

In this paper we show that there is an information-theoretic way to characterize chaos for 
Liouville densities in systems with a positive KS entropy. In particular, we show that a large 
class of Hamiltonian systems with positive KS entropy display an exponential hypersensitivity
to perturbation. We have investigated hypersensitivity to perturbation previously [8-12],
both for classical and quantum systems, and have characterized it as a rapid increase with
time of the information needed to describe the perturbed time evolution of the system state 
(Liouville density for classical systems, state vector for quantum systems), the information 
attaining values much larger than the entropy increase that results from averaging over the 
perturbation. 

Here we formulate the concept of hypersensitivity to perturbation more precisely. We 
consider the amount of information needed to keep track of the perturbed time evolution to a 
level of accuracy that keeps the increase of system entropy below a certain "tolerable" level. 
This information should be compared to the entropy reduction it buys, i.e., to the difference
between the entropy increase that results from averaging over the perturbation and the
tolerable entropy increase. We characterize hypersensitivity to perturbation in terms of the
ratio of information to entropy reduction. A system displays hypersensitivity to perturbation
if the ratio grows rapidly with time, becoming much larger than unity, for almost all values
of the tolerable entropy; a system displays exponential hypersensitivity to perturbation if the
ratio grows exponentially. We show that a large class of Hamiltonian systems with positive
KS entropy display exponential hypersensitivity to perturbation, with the exponential
growth rate given by the KS entropy. This result establishes a direct connection between 
measures of chaos based on trajectories and our information-theoretic characterization for 
Liouville densities. 

There are at least two important motivations for investigating signatures of chaos in 
Liouville densities. One motivation comes from the tricky question of how to characterize 
quantum chaos. In quantum mechanics, trajectories of state vectors show no sensitivity to
initial conditions because the Schrödinger equation is linear and preserves the inner product.
This argument does not prove, however, that there is no chaos in quantum mechanics
[13], because the Liouville equation, like the Schrödinger equation, is linear and preserves the
overlap between densities, yet any chaotic classical Hamiltonian system can be described by a
Liouville equation. Furthermore, the classical analog of a quantum state vector is not a point
in classical phase space, but a Liouville density [?]. In contrast to the above-mentioned
characterizations of classical chaos in terms of phase-space trajectories, a characterization
of classical chaos in terms of Liouville densities can be expected to have a straightforward
generalization to quantum systems [?]. We have indeed found that hypersensitivity to
perturbation is present in quantum systems [11,12].

The other main motivation for studying chaos in Liouville densities lies in the central
role Liouville densities play in statistical mechanics. The connection of the present work
with statistical mechanics is outlined in Sec. II. In Sec. III we give a precise definition of
hypersensitivity to perturbation. Section IV reviews the method of symbolic dynamics. In
Sec. V, the heart of the paper, we apply the method of symbolic dynamics to prove that a
large class of perturbed chaotic systems display exponential hypersensitivity to perturbation.
In Sec. VI we distill the essence of the symbolic-dynamics analysis to develop a simple,
heuristic picture of hypersensitivity to perturbation, which explains why chaotic systems
exhibit exponential hypersensitivity to perturbation and regular, or integrable, systems do
not. A reader not interested in the details of the symbolic dynamics might profitably skip
Secs. IV and V and proceed directly to Sec. VI.

II. CONNECTION WITH STATISTICAL MECHANICS



In statistical mechanics the exact point a system occupies in phase space typically is
not known. The predictions of classical statistical mechanics are derived from a Liouville
probability density $\rho(x)$ on phase space, which describes incomplete knowledge of the
system's phase-space point $x$ and which is the mathematical representation of a system state.
The entropy (in bits) of a system state $\rho(x)$, also called the Gibbs entropy or fine-grained
entropy, is defined as

$$ H = -\int d\Gamma(x)\,\rho(x)\log_2[\rho(x)] \,, \tag{2.1} $$

where $\Gamma(x)$ is the standard phase-space measure. (The use of base-2 logarithms here and
throughout this paper means that entropy and information are measured in bits.) Since
the Gibbs entropy is formally identical to Shannon's [14] statistical measure of information,
entropy can be interpreted as the amount of information missing toward a complete
specification of the system. The classical entropy is defined up to an arbitrary additive constant,
reflecting the fact that an infinite amount of information is needed to give the exact location
of a point in phase space.

As a consequence of Liouville's theorem, the entropy remains constant under Hamiltonian 
time evolution. We adopt here the Bayesian, or information-theoretic, approach to statistical 
mechanics [15-17], according to which the constancy of the Gibbs entropy is an expression
of the fact that no information about the initial Liouville density is lost under Hamiltonian 
time evolution. 

The Bayesian approach to statistical mechanics is connected with thermodynamics in 
the following way: Assume there is a heat reservoir at temperature T, with which all energy 
in the form of heat must ultimately be exchanged, possibly by using intermediate steps 
such as storage at some other temperature; then each bit of missing information about the 
system state reduces by the amount $k_B T \ln 2$ the energy that can be extracted from the
system in the form of useful work. The Bayesian approach can thus be summarized in two
statements: (i) entropy is missing information, which is a mathematical statement; (ii) each bit of
missing information costs $k_B T \ln 2$ of useful work, which is the physics.
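To give a feeling for the size of the work cost per bit, the following minimal sketch evaluates $k_B T \ln 2$ numerically. The function name and the reservoir temperature of 300 K are illustrative assumptions, not values taken from the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)


def work_cost_joules(bits: float, temperature_kelvin: float) -> float:
    """Useful work (in joules) forfeited for `bits` of missing information."""
    return bits * K_B * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    # 300 K is a hypothetical room-temperature reservoir chosen for illustration.
    print(work_cost_joules(1, 300.0))
```

At room temperature the cost per bit is of order $10^{-21}$ J, so the thermodynamic cost matters only when the number of bits is very large.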

Since entropy is a measure of missing information, entropy increases if information about 
the system is lost. There are two main mechanisms leading to information loss (as noted 
above, Hamiltonian time evolution is not such a mechanism): deliberate discarding of infor- 
mation and loss of information through interaction with an incompletely known environment. 

Deliberate discarding of information was used by Jaynes [15-17] to derive traditional
thermodynamics. Jaynes showed how equilibrium thermodynamics follows effortlessly from
the Liouville equation if the only information retained is the values of the macroscopic
variables defining a thermodynamic state. In Jaynes's approach, irrelevant information is
discarded by means of the principle of maximum entropy. Another example is the derivation
of the Boltzmann equation [18]; here information about correlations between particles is
discarded as irrelevant.

In contrast to these examples where information is discarded deliberately, an actual loss 
of information can occur in a system that, rather than being perfectly isolated, interacts 
with an incompletely known environment. The interaction with the environment leads to a 
perturbed time evolution of the system. Predictions for the system alone are made by tracing
out the environment (i.e., by averaging over the perturbations), which generally leads to an
entropy increase. This approach was pioneered by Borel [19,20]. The entropy increase of
the system due to the interaction with the environment is a result of the environment's 
being in an at least partially unknown state. If suitable information about the environment 
is obtained, the increase in system entropy can be reduced or, if sufficient information is 
obtained, prevented entirely. Averaging over the perturbing environment is usually justified 
by arguing that it is impossible in practice to control the environment. 

In this paper we go beyond the pragmatic argument that controlling the interaction with 
the environment is impossible in practice. We show how the information-theoretic approach 
to statistical mechanics leads naturally to a quantitative measure of how hard it is to keep 
the entropy of the system from increasing by gathering information about the environment. 
The key to quantifying the difficulty of controlling the interaction with the environment is 
Landauer's principle [21,22], which assigns a thermodynamic cost to information. According
to Landauer's principle, in the presence of a heat reservoir at temperature T, not only
does each bit of missing information have a free-energy cost of $k_B T \ln 2$, but each bit of
information that is acquired has the same free-energy cost of $k_B T \ln 2$. This cost, called the
Landauer erasure cost, is paid when the acquired information is erased. Acquired information
can be quantified by algorithmic information [?]; roughly speaking, the algorithmic
information in an observational record is the length in bits of the shortest record having the 
same information content. 

The question of how hard it is to reduce the system entropy by controlling the environment
can now be given a quantitative form: "How big is the Landauer erasure cost of
the information about the environment which is needed to reduce the increase of system 
entropy by a certain amount?" In the next section we give a mathematical formulation of 
this question. The later parts of this paper are devoted to showing that the answer can be 
used to characterize chaos. 



III. HYPERSENSITIVITY TO PERTURBATION 

Consider a classical Hamiltonian system initially described by a Liouville density
$\rho(x, t=0)$ on phase space. The initial entropy is

$$ H_0 = -\int d\Gamma(x)\,\rho(x,t=0)\log_2[\rho(x,t=0)] \,, \tag{3.1} $$

where $\Gamma(x)$ is the standard phase-space measure. By solving the Liouville equation, one
obtains the density $\rho(x,t)$ at time $t$. According to Liouville's theorem, the entropy remains
unchanged: the information about the initial density is preserved under Hamiltonian time
evolution.

Now assume that the system is coupled to an incompletely known environment in such a
way that the interaction can be described as an energy-conserving, typically time-dependent
perturbation of the system Hamiltonian. The system's interaction with the environment is
thus described by a stochastic Hamiltonian. We denote the perturbed system state at
time $t$ by $\rho_y(x,t)$, where $y$ labels the particular realization of the stochastic perturbation,
or perturbation history. The possible perturbation histories $y$ are distributed according to
a probability measure $\gamma(y)$. This description in terms of a stochastic system Hamiltonian
applies when the system is coupled to conserved quantities of the environment. The values of
the conserved environment quantities label the perturbation histories $y$, and the probability
measure $\gamma(y)$ is the probability measure for the conserved environment quantities.

For each perturbation history $y$, the entropy of the density $\rho_y(x,t)$ is equal to the initial
entropy $H_0$. Averaging over all possible perturbation histories leads to an average density

$$ \bar\rho(x,t) = \int d\gamma(y)\,\rho_y(x,t) \,, \tag{3.2} $$

with entropy

$$ \bar H = -\int d\Gamma(x)\,\bar\rho(x,t)\log_2[\bar\rho(x,t)] = H_0 + \Delta H_S \,, \tag{3.3} $$

where $\Delta H_S \ge 0$ is the entropy increase due to averaging over the incompletely known
environment. That $\Delta H_S \ge 0$ follows from the concavity of the entropy: the entropy of an
average distribution is greater than or equal to the average entropy of the distributions that
contribute to the average.

Now assume, in accordance with the discussion of Sec. II about gathering information
from the environment, that an arbitrary measurement, with discrete possible outcomes
labeled by integers $b$, is performed on the environment. The outcome $b$ has conditional
probability $p_{b|y}$, given the perturbation history $y$, and hence has unconditioned probability

$$ p_b = \int d\gamma(y)\,p_{b|y} \,. \tag{3.4} $$

The Liouville density for the system state conditional on outcome $b$ we denote by

$$ \rho_b(x,t) = \frac{1}{p_b}\int d\gamma(y)\,\rho_y(x,t)\,p_{b|y} = \int d\gamma(y|b)\,\rho_y(x,t) \,, \tag{3.5} $$

where $\gamma(y|b)$ is the probability measure for the perturbation histories conditional on outcome
$b$. It follows immediately that

$$ \sum_b p_b\,\rho_b(x,t) = \int d\gamma(y)\,\rho_y(x,t) = \bar\rho(x,t) \,. \tag{3.6} $$

We denote by

$$ \Delta H_b = -\int d\Gamma(x)\,\rho_b(x,t)\log_2[\rho_b(x,t)] - H_0 \ge 0 \tag{3.7} $$

the change in system entropy conditional on the measurement outcome $b$, where the
inequality follows from applying concavity to Eq. (3.5), and by

$$ \overline{\Delta H} = \sum_b p_b\,\Delta H_b \le \Delta H_S \tag{3.8} $$

the average conditional entropy change, where the inequality follows from applying concavity
to Eq. (3.6). Finally, we denote by

$$ \overline{\Delta I} = -\sum_b p_b \log_2 p_b \tag{3.9} $$

the average information needed to specify the measurement outcome $b$. Actually, Eq. (3.9)
is only a lower bound to the average algorithmic information needed to specify the
measurement outcome $b$, but it can be shown to be an extremely tight lower bound [25]. An
immediate consequence of the definition of entropy is that

$$ \overline{\Delta I} + \overline{\Delta H} \ge \Delta H_S \,, \tag{3.10} $$

with equality holding if and only if the densities $\rho_b(x,t)$ are disjoint.
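The inequality (3.10) can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not the construction used later in the paper: phase space is replaced by eight discrete cells, the perturbed densities are cyclic shifts of one base distribution (so all share the same entropy), the histories are equally likely, and the measurement is a deterministic grouping of histories into outcomes. All function and variable names are hypothetical.

```python
import math
from collections import defaultdict


def entropy_bits(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def mix(dists, weights):
    """Weighted average of distributions (weights are renormalized)."""
    total = sum(weights)
    return [sum(w * d[i] for d, w in zip(dists, weights)) / total
            for i in range(len(dists[0]))]


def shift(dist, s):
    """Cyclic shift of a distribution by s cells (a measure-preserving map)."""
    n = len(dist)
    return [dist[(i - s) % n] for i in range(n)]


def information_balance(densities, gamma, outcome_of):
    """Return (average information, average entropy change, Delta H_S)."""
    h0 = entropy_bits(densities[0])  # assumed common to all histories
    delta_h_s = entropy_bits(mix(densities, gamma)) - h0
    groups = defaultdict(list)
    for y, b in enumerate(outcome_of):
        groups[b].append(y)
    avg_info = 0.0
    avg_dh = 0.0
    for ys in groups.values():
        p_b = sum(gamma[y] for y in ys)
        rho_b = mix([densities[y] for y in ys], [gamma[y] for y in ys])
        avg_info -= p_b * math.log2(p_b)
        avg_dh += p_b * (entropy_bits(rho_b) - h0)
    return avg_info, avg_dh, delta_h_s


base = [0.5, 0.5, 0, 0, 0, 0, 0, 0]
densities = [shift(base, s) for s in (0, 1, 4, 5)]
gamma = [0.25] * 4

# Outcomes with disjoint conditional densities: equality in (3.10).
disjoint = information_balance(densities, gamma, [0, 0, 1, 1])
# Outcomes with overlapping conditional densities: strict inequality.
overlapping = information_balance(densities, gamma, [0, 1, 0, 1])
```

In this example the first grouping gives 1 bit of information and an average entropy change of 0.5 bits, which exactly matches the 1.5-bit increase from averaging; the second grouping spends the same 1 bit but leaves an average entropy change of 1 bit.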

Suppose now that one wants to limit the entropy increase of the system to a certain
tolerable amount $\Delta H_{\rm tol}$. Then the minimum amount of information about the perturbing
environment needed to keep the system entropy from increasing by more than $\Delta H_{\rm tol}$ can be
written as

$$ \Delta I_{\min} = \inf_{\overline{\Delta H} \le \Delta H_{\rm tol}} \overline{\Delta I} \,, \tag{3.11} $$

where the infimum is taken over all possible measurement schemes for which the average
conditional entropy increase does not exceed $\Delta H_{\rm tol}$. In other words, $\Delta I_{\min}$ is the information
about the environment that it takes to lower the entropy increase of the system from $\Delta H_S$
(the increase due to averaging over the perturbation) down to $\Delta H_{\rm tol}$; i.e., $\Delta I_{\min}$ is the
minimum information about the environment needed to reduce the system entropy by an
amount $\Delta H_S - \Delta H_{\rm tol}$. As a consequence of Eq. (3.10), it is a general theorem, and an
expression of the second law, that

$$ \Delta I_{\min} \ge \Delta H_S - \Delta H_{\rm tol} \,. \tag{3.12} $$

In the presence of a heat reservoir at temperature $T$, the information $\Delta I_{\min}$ has an energy
cost $k_B T \ln 2\,\Delta I_{\min}$ on erasure, which should be compared to the gain in extractable work
due to the observation, $k_B T \ln 2\,(\Delta H_S - \Delta H_{\rm tol})$.

We are now in a position to define hypersensitivity to perturbation. We say a system
is hypersensitive to perturbation if, for almost all values of $\Delta H_{\rm tol}$, the information $\Delta I_{\min}$ is
large compared with the corresponding entropy reduction $\Delta H_S - \Delta H_{\rm tol}$, i.e.,

$$ \frac{\Delta I_{\min}}{\Delta H_S - \Delta H_{\rm tol}} \gg 1 \,. \tag{3.13} $$

In terms of energy this definition says that, for a system displaying hypersensitivity to
perturbation, possible gains in system free energy through observations of the environment
are negligible compared to the Landauer erasure cost of the observational records.

Hypersensitivity to perturbation requires that the inequality (3.13) hold for almost all
values of $\Delta H_{\rm tol}$. The inequality (3.13) tends always to hold for sufficiently small values of
$\Delta H_{\rm tol}$. The reason is that for these small values of $\Delta H_{\rm tol}$, one is gathering enough
information from the perturbing environment to track a particular system state whose entropy
is nearly equal to the initial system entropy $H_0$. In other words, one is essentially
tracking a particular realization of the perturbation among all possible realizations. Thus, for
small values of $\Delta H_{\rm tol}$, the information $\Delta I_{\min}$ is a property of the perturbation, being the
information to specify a particular realization of the perturbation. The important regime
for assessing hypersensitivity to perturbation is thus where $\Delta H_{\rm tol}$ is near to $\Delta H_S$, and it is
in this regime that one can hope that $\Delta I_{\min}$ reveals something about the system dynamics,
rather than properties of the perturbation.

In earlier publications [10-12], we have conjectured that chaotic Hamiltonian systems,
classical or quantum, show hypersensitivity to perturbation. For classical chaotic systems,
this can be made plausible in the following way. Under chaotic time evolution, the Liouville
density develops structure on finer and finer scales. This highly structured pattern is not
itself complex in the algorithmic sense (it is completely specified by the initial density, the
Hamiltonian, and the elapsed time), but it can be perturbed in an enormous number of
ways [?]. This means that the unperturbed pattern lies very close to a large number of
highly complex patterns and that the information about the perturbation needed to specify
the perturbed pattern can be very large. In Sec. V we go beyond this heuristic argument
and give a proof that a large class of classical chaotic Hamiltonian systems exhibit an
exponential hypersensitivity to perturbation, in which the ratio (3.13) of information to
entropy reduction grows exponentially with time, with the exponential rate of growth given
by the KS entropy of the chaotic dynamics. We find that for this class of chaotic systems,
the exponential hypersensitivity to perturbation is to a large extent independent of the exact
nature of the perturbations and, in particular, of the strength of the perturbations.

In the following sections we limit our investigation to discrete maps. There are two
natural ways in which a Hamiltonian flow $\phi_t : X \to X$ on the phase space $X$ induces a
discrete map. For an arbitrary time step $\tau$, a map $f : X \to X$ is defined by $f x = \phi_\tau x$ for all
$x \in X$. Since $\phi_t \phi_s x = \phi_{t+s} x$ for all times $t$ and $s$ and all $x \in X$, the map $f$ and the flow $\phi_t$
are closely related by $f^n x = \phi_{n\tau} x$ for all $x \in X$ and all integers $n$. Alternatively, a discrete
map can be defined via a Poincaré surface of section [?]. The stochastic perturbation of the
flow induces a stochastic perturbation of the map at each step.
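The first construction, a stroboscopic map $f = \phi_\tau$ with $f^n = \phi_{n\tau}$, can be sketched as follows. The flow used here is that of a harmonic oscillator with unit mass and unit frequency, chosen only because its flow is known in closed form; it is an integrable system and serves purely to illustrate the construction. All names are hypothetical.

```python
import math


def flow(t, point):
    """Exact flow phi_t of H = (p^2 + q^2)/2 acting on a point (q, p)."""
    q, p = point
    c, s = math.cos(t), math.sin(t)
    return (q * c + p * s, -q * s + p * c)


def make_map(tau):
    """Discrete map f = phi_tau induced by sampling the flow every tau."""
    return lambda point: flow(tau, point)


def iterate(f, point, n):
    """Apply the map f to a point n times."""
    for _ in range(n):
        point = f(point)
    return point


x0 = (1.0, 0.5)
tau = 0.3
after_map = iterate(make_map(tau), x0, 7)   # f^7 x
after_flow = flow(7 * tau, x0)              # phi_{7 tau} x
```

The two results agree up to floating-point rounding, which is the group property of the flow expressed numerically.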



IV. SYMBOLIC DYNAMICS 

The basic idea underlying the method of symbolic dynamics is to simplify the analysis 
of dynamical systems by representing points in phase space by symbolic sequences. Parts of
the following discussion closely follow Ref. [?].

A discrete abstract dynamical system $(M, \mu, f)$ consists of a measurable space $M$ with a
normalized measure $\mu$ and a measure-preserving automorphism $f$ on $M$, i.e., $\mu(M) = 1$ and
$\mu(fA) = \mu(A)$ for all measurable $A$ [26,27]. A measurable partition $\mathcal{E}$ of $M$ is defined as a
collection $\mathcal{E} = \{E_1, \ldots, E_m\}$ of measurable sets such that

$$ \bigcup_{i=1}^{m} E_i = M \quad\text{and}\quad \sum_{i=1}^{m} \mu(E_i) = 1 \,. \tag{4.1} $$



Consider an $m$-letter alphabet $C = \{1, \ldots, m\}$ where each letter corresponds to one of the $m$
sets in the partition $\mathcal{E}$. We denote by $\omega = \cdots\omega_{-1}\omega_0\omega_1\omega_2\cdots$ a bi-infinite sequence of letters
$\omega_n \in C$ and by $\Sigma$ the set of all such symbolic sequences.
For each $x \in M$ we define the set $\Sigma_x \subset \Sigma$ as follows:

$$ \Sigma_x = \Bigl\{ \omega \in \Sigma \Bigm| x \in \bigcap_{n=-\infty}^{\infty} f^{-n} E_{\omega_n} \Bigr\} \,. \tag{4.2} $$

Equivalently, one can say that $\omega \in \Sigma_x \iff f^n x \in E_{\omega_n}$ for all $n$. The set

$$ \Sigma_{\mathcal{E}} = \bigcup_{x \in M} \Sigma_x \tag{4.3} $$

of all symbolic sequences corresponding to at least one point in $M$ is called the set of
admissible sequences. The partition $\mathcal{E}$ is called a generating partition if for each $\omega \in \Sigma_{\mathcal{E}}$ the
intersection

$$ \bigcap_{n=-\infty}^{\infty} f^{-n} E_{\omega_n} \tag{4.4} $$

consists of only one point, i.e., if each admissible symbolic sequence defines a unique point
in $M$. In general, even for generating partitions, $\Sigma_x$ may have more than one element,
which means that a point $x \in M$ may be represented by several symbolic sequences $\omega \in \Sigma_x$.
For a generating partition, the picture one should have is that the set $\Sigma_{\mathcal{E}}$ of all admissible
sequences is the union of disjoint subsets $\Sigma_x$, which may have more than one member.

Let us further define symbolic words as finite symbolic sequences $\omega_{n_1}\cdots\omega_{n_2}$ where
$n_1 \le n_2$. In analogy with Eq. (4.4), we define the set of points corresponding to the symbolic
word $\omega_{n_1}\cdots\omega_{n_2}$ by

$$ E_{\omega_{n_1}\cdots\omega_{n_2}} = \bigcap_{n=n_1}^{n_2} f^{-n} E_{\omega_n} \,. \tag{4.5} $$

We denote by $\Sigma^{(n_1,n_2)}$ the set of all symbolic words $\omega_{n_1}\cdots\omega_{n_2}$. The symbolic word $\omega_{n_1}\cdots\omega_{n_2}$
is admissible if $E_{\omega_{n_1}\cdots\omega_{n_2}}$ contains at least one point; we denote by $\Sigma_{\mathcal{E}}^{(n_1,n_2)}$ the set of
admissible symbolic words $\omega_{n_1}\cdots\omega_{n_2}$. The $N$th refinement $\mathcal{E}^N$ of the partition $\mathcal{E}$, defined
by

$$ \mathcal{E}^N = \{ E_{\omega_0\cdots\omega_{N-1}} \mid \omega_0\cdots\omega_{N-1}\ \text{admissible} \} \,, \tag{4.6} $$

is also a measurable partition. If $\mathcal{E}$ is a generating partition, then all refinements of $\mathcal{E}$ are also
generating partitions. Furthermore, if $\mathcal{E}$ is generating, then the sigma algebra generated by
all refinements of $\mathcal{E}$ coincides with the sigma algebra of all measurable subsets of $M$ [28-30].
The measure $\mu$ induces a measure on the sigma algebra generated by the set of all symbolic
words via

$$ \mu(\omega_{n_1}\cdots\omega_{n_2}) = \mu(E_{\omega_{n_1}\cdots\omega_{n_2}}) \,. \tag{4.7} $$
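Let us also define a conditional measure

$$ \mu(\omega_{n_1}\cdots\omega_{n_2} \mid \omega_{n_2+1}\omega_{n_2+2}\cdots) = \lim_{n\to\infty} \frac{\mu(\omega_{n_1}\cdots\omega_{n_2+n})}{\mu(\omega_{n_2+1}\cdots\omega_{n_2+n})} \tag{4.8} $$

whenever the limit on the right-hand side exists, which is the case for K systems (see below).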
The entropy $H(\mathcal{E}^N)$ of the refinement $\mathcal{E}^N$ is defined by

$$ H(\mathcal{E}^N) = -\sum_{E_{\omega_0\cdots\omega_{N-1}} \in \mathcal{E}^N} \mu(E_{\omega_0\cdots\omega_{N-1}}) \log_2 \mu(E_{\omega_0\cdots\omega_{N-1}}) \,. \tag{4.9} $$

The metric entropy or Kolmogorov-Sinai (KS) entropy of the map $f$ is defined as

$$ h_\mu(f) = \sup_{\mathcal{E}} h_\mu(f|\mathcal{E}) \,, \tag{4.10} $$

where the supremum is taken over all measurable partitions $\mathcal{E}$ and where

$$ h_\mu(f|\mathcal{E}) = \lim_{N\to\infty} \frac{H(\mathcal{E}^N)}{N} \,. \tag{4.11} $$

If $\mathcal{E}$ is generating, then $h_\mu(f) = h_\mu(f|\mathcal{E})$ [28-30]. Systems with a positive KS entropy are
called K systems. Despite its name, the KS entropy is quite different from the Gibbs entropy,
for two reasons: (i) $H(\mathcal{E}^N)$ has nothing directly to do with probabilities on the phase space
$M$, but is the Shannon information of the ensemble of sets in $\mathcal{E}^N$, when the probability
of each set is given by its measure; (ii) $h_\mu(f|\mathcal{E})$ is not an entropy at all, but rather is the
asymptotic rate of increase of $H(\mathcal{E}^N)$.
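The definitions (4.9) and (4.11) can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the measure of a symbolic word is the product of fixed letter measures (a Bernoulli shift); for a general K system word measures need not factorize this way. With two equally likely letters this corresponds to the baker's map with the left-half/right-half partition. Function names are hypothetical.

```python
import math
from itertools import product


def refinement_entropy(letter_probs, n_letters):
    """H(E^N) in bits, assuming word measure = product of letter measures."""
    total = 0.0
    for word in product(range(len(letter_probs)), repeat=n_letters):
        mu = math.prod(letter_probs[a] for a in word)
        if mu > 0:
            total -= mu * math.log2(mu)
    return total


def entropy_rate_estimates(letter_probs, max_n):
    """Finite-N estimates H(E^N)/N of the entropy rate for N = 1..max_n."""
    return [refinement_entropy(letter_probs, n) / n
            for n in range(1, max_n + 1)]


uniform_rates = entropy_rate_estimates([0.5, 0.5], 8)    # 1 bit per step
skewed_rates = entropy_rate_estimates([0.25, 0.75], 8)   # about 0.811 bits per step
```

For a product measure $H(\mathcal{E}^N)$ grows exactly linearly in $N$, so every finite-$N$ estimate already equals the limit; this shows directly that the rate, not $H(\mathcal{E}^N)$ itself, is the quantity that characterizes the map.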

A dynamical system is called ergodic if time averages equal ensemble averages, i.e., if

$$ \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \phi(f^n x) = \int_M d\mu\,\phi \quad \text{for almost all } x \in M \,, \tag{4.12} $$

for any $\mu$-integrable function $\phi$ [26]. All K systems are ergodic [26].

The map $f$ induces a particularly simple so-called shift map $\sigma : \Sigma \to \Sigma$ on the set of
symbolic sequences. The shift map is defined as

$$ (\sigma\omega)_n = \omega_{n+1} \quad \text{for all } n \,; \tag{4.13} $$

i.e., $\sigma$ shifts the entire symbolic sequence to the left. The shift map can be extended to a
map $\sigma : \Sigma^{(n_1,n_2)} \to \Sigma^{(n_1-1,n_2-1)}$ that acts on symbolic words $\omega_{n_1}\cdots\omega_{n_2} \in \Sigma^{(n_1,n_2)}$ via

$$ [\sigma(\omega_{n_1}\cdots\omega_{n_2})]_n = \omega_{n+1} \quad \text{for } n_1 - 1 \le n \le n_2 - 1 \,. \tag{4.14} $$

The set of admissible sequences is invariant under the shift map, i.e.,

$$ \sigma(\Sigma_{\mathcal{E}}) = \Sigma_{\mathcal{E}} \,. \tag{4.15} $$
Furthermore, for a generating partition $\mathcal{E}$, the map $\pi : \Sigma_{\mathcal{E}} \to M$ defined by

$$ \pi(\omega) = \bigcap_{n=-\infty}^{\infty} f^{-n} E_{\omega_n} \tag{4.16} $$

[i.e., $\pi(\omega) = x \iff \omega \in \Sigma_x$] is single-valued and continuous [?]. If the sets $E_i$ forming
the partition $\mathcal{E}$ are not mutually exclusive, then the map $\pi$ is not one-to-one. The overlap
between different sets $E_i$, however, is of measure zero. The relation between $f$ and $\sigma$ can be
summarized in the following commutation diagram:

$$ \begin{array}{ccc} \Sigma_{\mathcal{E}} & \xrightarrow{\ \sigma\ } & \Sigma_{\mathcal{E}} \\ \pi \downarrow & & \downarrow \pi \\ M & \xrightarrow{\ f\ } & M \end{array} \tag{4.17} $$

The action of $f$ on measurable subsets of $M$ is faithfully represented by the action of $\sigma$ on
measurable sets of symbolic sequences. In the following section, we use this representation
to study hypersensitivity to perturbation for K systems.
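The correspondence between a map and the shift on symbolic sequences can be illustrated with the baker's map on the unit square, using the partition into left and right halves. This is a minimal sketch: the choice of map and partition, the letters 0 and 1, and all function names are assumptions made for illustration. Exact rational arithmetic is used so that repeated doubling does not lose digits.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def baker(point):
    """Baker's map on the unit square: stretch in x, compress in y, stack."""
    x, y = point
    bit = 1 if x >= HALF else 0
    return (2 * x - bit, (y + bit) / 2)


def baker_inverse(point):
    """Inverse of the baker's map."""
    x, y = point
    bit = 1 if y >= HALF else 0
    return ((x + bit) / 2, 2 * y - bit)


def letter(point):
    """Partition cell containing the point: 0 = left half, 1 = right half."""
    return 1 if point[0] >= HALF else 0


def symbolic_word(point, n1, n2):
    """Letters omega_n for n1 <= n <= n2, where f^n(point) lies in cell omega_n."""
    current = point
    step = baker if n1 >= 0 else baker_inverse
    for _ in range(abs(n1)):        # move to f^{n1}(point)
        current = step(current)
    word = []
    for _ in range(n1, n2 + 1):     # read one letter per forward step
        word.append(letter(current))
        current = baker(current)
    return word


x0 = (Fraction(3, 7), Fraction(2, 5))
word_of_x = symbolic_word(x0, -2, 5)             # omega_{-2} ... omega_5 of x
word_of_fx = symbolic_word(baker(x0), -3, 4)     # omega_{-3} ... omega_4 of f x
```

The two words coincide letter by letter: applying the map to the point relabels $\omega_{n+1}$ as the new $\omega_n$, which is the left shift of Eq. (4.13).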

For the remainder of this section, we assume that $f$ is a K system with KS entropy $h$
and that $\mathcal{E}$ is a generating partition. Since the set $\Sigma_{\mathcal{E}}$ of admissible symbolic sequences is
invariant under the action of the shift map $\sigma$ according to Eq. (4.15), $\Sigma_{\mathcal{E}}$ is a stationary
source in the language of information theory [?]. Moreover, by choosing the function $\phi$ in
Eq. (4.12), for an arbitrary symbolic word $\tilde\omega = \omega_{n_1}\cdots\omega_{n_2}$, to be the indicator function of
the set $E_{\tilde\omega}$ [see Eq. (4.5)] corresponding to $\tilde\omega$, i.e.,

$$ \phi(x) = \begin{cases} 1 & \text{if } x \in E_{\tilde\omega} \,, \\ 0 & \text{otherwise} \,, \end{cases} \tag{4.18} $$

one sees that $\Sigma_{\mathcal{E}}$ is an ergodic source since $f$ is ergodic.

According to the Shannon-McMillan theorem, stationary ergodic sources have the
asymptotic equipartition property [?]. This means crudely that for sufficiently large $n$ and arbitrary
$n_1$, the set $\Sigma_{\mathcal{E}}^{(n_1,n_1+n-1)}$ of admissible symbolic words of length $n$ consists of approximately
$2^{nh}$ symbolic words, each approximately of measure $2^{-nh}$, whereas each of the remaining
symbolic words has negligible measure. The choice of $n_1$ is irrelevant because the source is
stationary. Formally, a source has the asymptotic equipartition property if and only if for
any $\epsilon > 0$ there is a positive integer $n_0(\epsilon)$ such that, for $n > n_0(\epsilon)$ and arbitrary $n_1$, the
set $\Sigma_{\mathcal{E}}^{(n_1,n_1+n-1)}$ of admissible symbolic words of length $n$ decomposes into two sets $\Pi$ and $T$
satisfying

$$ \sum_{\omega \in \Pi} \mu(\omega) < \epsilon \tag{4.19} $$

and

$$ 2^{-n(h+\epsilon)} < \mu(\omega) < 2^{-n(h-\epsilon)} \quad \text{for } \omega \in T \,. \tag{4.20} $$
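The asymptotic equipartition property can be checked numerically for a simple stand-in source. The sketch below assumes a two-letter source whose word measures factorize (letter "1" with probability 0.3), which is a hypothetical example rather than a source derived from a particular map. It counts the words whose measure lies in the window of Eq. (4.20) and adds up their total measure; words are grouped by their number of ones, so no explicit enumeration is needed.

```python
import math


def typical_set_summary(p_one, n, eps):
    """Entropy rate h, size of the typical set T, and total measure of T."""
    h = -(p_one * math.log2(p_one) + (1 - p_one) * math.log2(1 - p_one))
    count = 0        # number of words satisfying the window of Eq. (4.20)
    measure = 0.0    # total measure of those words
    for k in range(n + 1):  # k = number of letters "1" in the word
        log2_mu = k * math.log2(p_one) + (n - k) * math.log2(1 - p_one)
        if -n * (h + eps) < log2_mu < -n * (h - eps):
            multiplicity = math.comb(n, k)
            count += multiplicity
            # Work in the log domain to avoid floating-point underflow.
            measure += 2.0 ** (math.log2(multiplicity) + log2_mu)
    return h, count, measure


h, count, measure = typical_set_summary(0.3, 1000, 0.05)
```

For these parameters the typical words carry nearly all of the measure, while their number is a vanishing fraction of the $2^{1000}$ words of that length, in line with the crude statement of the property.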



V. PERTURBED CHAOTIC MAPS 

Let $(M, \mu, f)$ be a discrete abstract dynamical system that is derived from a Hamiltonian
phase-space flow as described at the end of Sec. III. This means, in particular, that the
measure $\mu$ is the standard phase-space measure, in units such that the accessible volume of
phase space is unity. At the $n$th step the effect of the unperturbed system dynamics is to
change the phase-space density from the density $\rho(x, n-1)$ that emerges from the $(n-1)$th
step to a new density

$$ \rho'(x,n) = \rho(f^{-1}x, n-1) \,. \tag{5.1} $$

We model a measure-preserving stochastic perturbation by alternating unperturbed time
steps with application of measure-preserving perturbation maps. More precisely, we do the
following. We have available a collection of measure-preserving perturbation maps. At the
$n$th step we select randomly a particular perturbation map $\xi : M \to M$ from this collection
and apply it to the density $\rho'(x,n)$ that is produced by the unperturbed time step. This
yields a new density

$$ \rho(x,n) = \rho'(\xi^{-1}x, n) = \rho\bigl(f^{-1}(\xi^{-1}x), n-1\bigr) \,, \tag{5.2} $$

which depends on the map $\xi$ and which is the input to the next step.
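The alternation of an unperturbed step and a perturbation map can be sketched on a finite lattice. The example below is an illustration under explicit assumptions: the unperturbed map is a cat map acting as a permutation of a 16 x 16 lattice of cells, the perturbation maps are lattice translations, and entropy is the Shannon entropy of the cell probabilities (which differs from the Gibbs entropy only by an additive constant). None of these choices comes from the text, and all names are hypothetical.

```python
import math
from itertools import product

N = 16  # lattice size (hypothetical choice)


def cat_map(cell):
    """Unperturbed map: an invertible, measure-preserving lattice permutation."""
    i, j = cell
    return ((i + j) % N, (i + 2 * j) % N)


def translate(cell, shift):
    """Perturbation map: a measure-preserving translation of the lattice."""
    return ((cell[0] + shift[0]) % N, (cell[1] + shift[1]) % N)


def step(density, shift):
    """One perturbed step: the unperturbed map, then the perturbation map."""
    # Pushing each cell's mass forward is equivalent to
    # rho(x, n) = rho(f^{-1}(xi^{-1} x), n - 1).
    return {translate(cat_map(cell), shift): mass
            for cell, mass in density.items()}


def evolve(density, history):
    """Apply the perturbed steps of one perturbation history in order."""
    for shift in history:
        density = step(density, shift)
    return density


def entropy_bits(density):
    """Shannon entropy in bits of the cell probabilities."""
    return -sum(m * math.log2(m) for m in density.values() if m > 0)


def average(densities):
    """Equal-weight average over a list of densities."""
    avg = {}
    for d in densities:
        for cell, mass in d.items():
            avg[cell] = avg.get(cell, 0.0) + mass / len(densities)
    return avg


# Initial pattern: uniform density on a 2 x 2 block of cells (entropy 2 bits).
initial = {(i, j): 0.25 for i in range(2) for j in range(2)}
# All perturbation histories of three steps built from two possible shifts.
histories = list(product([(0, 0), (1, 0)], repeat=3))
perturbed = [evolve(initial, h) for h in histories]
averaged = average(perturbed)
```

Every individual history leaves the entropy at its initial value, because each step merely permutes cells, whereas the density averaged over histories has a larger entropy.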

We characterize the perturbation maps in terms of two quantities: (i) the "strength" of 
the perturbation, which is roughly the size of the phase-space displacements produced by 
the maps, and (ii) the "correlation cells," which are roughly the phase-space regions over 
which the displacements produced by the maps remain correlated. We pause here to give a 
more precise general definition of perturbation strength, because it highlights an essential 
feature of chaotic dynamics. We defer defining the concept of correlation cells precisely till 
it emerges naturally in the context of the symbolic dynamics of perturbed chaotic maps. 
We return to both concepts in Sec. VI, where they are used to develop a heuristic picture
of hypersensitivity to perturbation. 

To characterize the "strength" of a perturbation, we let $\delta(x_1, x_2)$ denote the Euclidean
distance between the two points $x_1, x_2 \in M$ relative to some fixed set of canonical
coordinates. An $\epsilon$-perturbation map is a perturbation map $\xi$ for which $\delta(\xi x, x) < \epsilon$ for
all $x \in M$. An $\epsilon$-perturbation map describes a perturbation whose strength is smaller than
the scale set by $\epsilon$.

Now suppose that the initial density $\rho(x, n=0)$ is well behaved in the sense that there
is a scale on which $\rho(x, n=0)$ varies little; i.e., there is an $\epsilon > 0$ such that $\rho(x_1, n=0) \approx
\rho(x_2, n=0)$ for any pair of points $x_1, x_2 \in M$ with $\delta(x_1, x_2) < \epsilon$. Then, for any integer
$n > 0$, there is an $\epsilon > 0$ such that $\rho(x, n)$ varies little on the scale of $\epsilon$. We say that the
system is effectively shielded against perturbations at the $n$th step if there is an $\epsilon > 0$ such
that the perturbation is described by $\epsilon$-perturbation maps and the density $\rho(x, n)$ varies
little on the scale of $\epsilon$.

One of the defining properties of chaotic dynamics is that the scale $\epsilon$ on which the density
varies little decreases exponentially with the number of time steps $n$. This entails that chaotic
systems cannot be effectively shielded against perturbations, except for a small number
of time steps. We use this fact below as the starting point for developing an essentially
universal description of perturbed chaotic dynamics. Regular, or integrable, systems have no
exponential relationship between $\epsilon$ and $n$ and thus cannot be fitted within the analysis of
this section. We thus defer discussion of regular systems until we have developed a heuristic
picture of hypersensitivity to perturbation in Sec. VI.

We now proceed to show that all K systems for which there is a generating partition
exhibit hypersensitivity to perturbation. This includes all K systems that have a Markov
partition ||. Assume that the discrete abstract dynamical system $(M, \mu, f)$ has a finite
KS entropy $h = h_\mu(f) > 0$, and let $\mathcal{E} = \{E_1, \ldots, E_m\}$ be a generating
partition of $M$. As explained in Sec. IV, $f$ can be represented by a shift map $\sigma$ on the
set of admissible symbolic sequences $\Sigma_{\mathcal{E}}$, each admissible symbolic sequence
corresponding to a single point in $M$. In the following, we identify symbolic sequences with the
corresponding points and symbolic words with the corresponding subsets of $M$, writing, e.g.,
"the symbolic word $\omega_{n_1}\cdots\omega_{n_2}$" when we really mean the set of points
corresponding to the symbolic word $\omega_{n_1}\cdots\omega_{n_2}$. The set of admissible
symbolic sequences has the asymptotic equipartition property; i.e., for $n \gg 1$, $M$ is
partitioned by the admissible symbolic words $\omega_1\cdots\omega_n$ in such a way that there
are approximately $2^{nh}$ symbolic words each approximately of measure $2^{-nh}$, whereas each
of the remaining symbolic words has negligible measure.
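
The asymptotic equipartition property can be checked numerically in the simplest setting. The sketch below is only an illustration under an assumption that is not part of the analysis above: it uses a binary Bernoulli shift (independent letters, probability `p` for the letter 1), for which the KS entropy is the Shannon entropy per letter. The function name `typical_set` and the tolerance `eps` are hypothetical.

```python
import math

def typical_set(n, p, eps):
    """Count length-n binary words whose measure lies within 2^{-n(h +/- eps)}.

    Assumes a Bernoulli shift with P(letter = 1) = p (an illustrative choice).
    Returns (h, number of typical words, total measure of typical words).
    """
    h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # entropy per letter
    count, measure = 0, 0.0
    for k in range(n + 1):  # k = number of ones in the word
        log2_mu = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log2_mu / n - h) <= eps:
            c = math.comb(n, k)  # number of words with exactly k ones
            count += c
            measure += 2.0 ** (math.log2(c) + log2_mu)  # work in log space
    return h, count, measure

h, count, measure = typical_set(1000, 0.2, 0.05)
print(f"h = {h:.4f}, log2(count)/n = {math.log2(count) / 1000:.4f}, measure = {measure:.4f}")
```

For long words the typical words carry almost all of the measure, while their number stays below $2^{n(h+\epsilon)}$, which is the content of the statement above.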

Let us first look at the unperturbed evolution of a simple initial state on $M$. We assume
that the initial density $\rho(x, n = 0)$ is constant on the set of points corresponding to the
symbolic word

$$\Omega = \omega_{n_0+1}\cdots\omega_{n_0+q}, \qquad q \gg 1, \tag{5.3}$$

and zero elsewhere. Here $\omega_{n_0+1}\cdots\omega_{n_0+q}$ is one of the symbolic words that
has measure

$$\mu_0 = \mu(\omega_{n_0+1}\cdots\omega_{n_0+q}) \simeq 2^{-qh}. \tag{5.4}$$

In the following, we refer to a subset of M on which the density is constant as a pattern. 

We choose the (arbitrary) zero of the entropy such that the entropy of a uniform density
constant on the entire set $M$ vanishes. This is a natural choice because it corresponds to
choosing units such that $\mu(x)$ is the measure in the Gibbs entropy (2.1). The entropy of the
initial density is thus

$$H_0 = \log_2 \mu_0 \simeq \log_2(2^{-qh}) = -qh. \tag{5.5}$$

The condition $q \gg 1$ means that the initial entropy $H_0$ is much smaller than the negative
of the KS entropy of the map, $-h$.

Applying the shift map $\sigma$ for $n$ steps leads to a uniform density on
$\Omega' = \sigma^n\Omega = \omega'_{n_0+1-n}\cdots\omega'_{n_0+q-n}$, where
$\omega'_k = \omega_{k+n}$. The entropy of the shifted pattern remains unchanged.
As was stressed in Sec. II, the entropy does not change under unperturbed Hamiltonian
evolution. Moreover, the method of symbolic dynamics makes it obvious that no
additional information beyond the initial pattern and the number of steps $n$ is needed to give
a complete description of the evolved pattern. As was pointed out in Sec. III, the evolved
unperturbed density, though highly structured when viewed in phase space, is not complex
in the algorithmic sense.
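
A minimal sketch of this bookkeeping, assuming a symbolic word is stored as the index of its leftmost letter plus a string of letters (the `Word` class and both function names are hypothetical, introduced only for illustration). The shift relabels indices and leaves the letters untouched, so the evolved pattern is fully described by the initial word and the step count.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Word:
    start: int    # index of the leftmost specified letter (n_0 + 1 in the text)
    letters: str  # the q specified letters

def shift(word: Word, n: int) -> Word:
    """Apply the shift map n times: omega'_k = omega_{k+n}, so indices drop by n."""
    return Word(word.start - n, word.letters)

def log2_measure(word: Word, h: float) -> float:
    """Equipartition estimate log2(mu) ~ -q*h for a typical word of length q."""
    return -len(word.letters) * h

initial = Word(start=5, letters="0110100")
evolved = shift(initial, 3)
print(evolved, log2_measure(initial, 1.0), log2_measure(evolved, 1.0))
```

The estimated measure, and hence the entropy, depends only on the word length, which the shift does not change.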

We now turn to perturbed evolution. At each step, instead of applying just the map
$f$, we now apply first $f$ and then a measure-preserving map $\xi$ selected randomly from our
collection of maps. We make two major assumptions about the perturbation maps $\xi$ and
their probabilities, the first assumption having to do with the perturbation strength and the
second with the perturbation correlation cells. The first assumption is that below some scale
on phase space, a single application of the perturbation randomizes the pattern completely.
In symbolic language this scale is characterized by some negative integer $-n_p$, and our
assumption can be written as

$$\mathrm{Prob}\bigl((\xi\omega)_k = \omega'_k,\ k = n, \ldots, -n_p\bigr) = \mu(\omega'_n\cdots\omega'_{-n_p} \,|\, \omega_{-n_p+1}\omega_{-n_p+2}\cdots) \tag{5.6}$$

for all $n \le -n_p$, all $\omega \in \Sigma_{\mathcal{E}}$, and all admissible symbolic words
$\omega'_n\cdots\omega'_{-n_p}$,

where Prob stands for probability with respect to the random selection of the perturbation
map. The integer $n_p$ is a measure of the strength of the perturbation, large $n_p$ meaning a
weak perturbation. Another way of describing this assumption is the following: take a point
on phase space, i.e., a symbolic sequence $\omega$, and perturb it to get a new point $\xi\omega$.
Eq. (5.6) means that it is unpredictable, relative to the random selection of perturbation map
$\xi$, in which partition element $E_i$ the $n_p$th backward iterate and all further backward
iterates of the perturbed point $\xi\omega$ fall.

Our second major assumption concerns the perturbation correlation cells. We assume
that the perturbation maps $\xi$ are not completely arbitrary, but that a particular map
displaces neighboring points in a similar way and that, averaged over the random selection of
perturbation maps, the displacements become uncorrelated for points sufficiently far away
from each other. We model this behavior by assuming that the space $M$ is partitioned into
perturbation cells $\omega_{-n_p+s+1}\cdots\omega_{-n_p+s+r}$, where $r \gg 1$ and $s \ge 0$ are
integers, such that, first, the perturbations are uncorrelated for points in different
perturbation cells and, second, knowing how a typical point in a perturbation cell is perturbed
determines how all points in that cell are perturbed. These perturbation cells are a precise
realization of the notion of correlation cells.

In addition to our two major assumptions, we make several simplifying assumptions or
approximations about the perturbation maps. These simplifying assumptions always tend
to reduce the information $\Delta I_{\min}$ required to reduce the entropy increase to the
tolerable amount $\Delta H_{\rm tol}$. Since we want to prove that $\Delta I_{\min}$ is large,
such simplifying assumptions do not limit the validity of our results. As our first simplifying
assumption, we ignore all features of the perturbation maps beyond what is needed to satisfy
Eq. (5.6); i.e., we choose perturbation maps $\xi$ that satisfy

$$(\xi\omega)_n = \omega_n \quad \text{for all } n > -n_p \text{ and } \omega \in \Sigma_{\mathcal{E}}, \tag{5.7}$$

in addition to Eq. (5.6). This assumption means that the perturbation maps have no effect
at all on scales larger than the scale set by $n_p$. Allowing the perturbation maps to act on
scales larger than that set by $n_p$ would lead to more distinguishable perturbed patterns, and
thus to higher $\Delta I_{\min}$, which would have to be tracked to keep the entropy increase to
some tolerable amount.

Since it is impossible to shield a chaotic system against perturbations in the sense defined
above, we are justified in choosing the zero of time ($n = 0$) such that the perturbation
becomes effective at the first time step ($n = 1$). This amounts to choosing the initial symbolic
word (5.3) so that $n_0 = -n_p$, where $n_p$ is the integer that characterizes the strength of the
perturbation. This initial symbolic word, which defines the pattern on which the initial
density $\rho(x, n = 0)$ is nonzero, can thus be written as

$$\Omega = \;|\; \omega_{-n_p+1}\cdots\omega_{-n_p+s} \;|\; \omega_{-n_p+s+1}\cdots\omega_{-n_p+s+r} \;|\; \cdots\omega_{-n_p+q}. \tag{5.8}$$

Since the perturbation maps satisfy Eq. (5.7), the perturbation leaves the pattern of Eq. (5.8)
unchanged. After one time step, however, the leftmost symbol moves into the perturbation
region, located to the left of the leftmost vertical bar in Eq. (5.8), where it is randomized by
the perturbation according to Eq. (5.6). The perturbation region is separated by $s$ letters
from the decision region, located between the middle and rightmost vertical bars in Eq. (5.8).
This decision region, $r$ letters wide, defines the perturbation cells. Since we assume that
$r \gg 1$, there are approximately $2^{rh}$ typical perturbation cells, each of size
$\simeq 2^{-rh}$, whereas the total size of the remaining perturbation cells can be neglected.
Even though the assumption $q > r + s$ is implicit in the way we write the initial word in
Eq. (5.8), this assumption is not necessary for our analysis.

Focus attention now on the phase-space density $\rho(x, n)$ after $n$ time steps, where we
assume that

$$q - s \ge n \gg \max(1,\, q - s - r). \tag{5.9}$$

These assumptions assure us that the leftmost letter of the initial word (5.8) has moved deep
into the perturbation region and the rightmost letter has moved far to the left of the right
boundary of the decision region, but not more than one position beyond the left boundary
of the decision region. After $n$ unperturbed steps the initial pattern $\Omega$ given by Eq. (5.8)
evolves into the pattern $\Omega' = \sigma^n\Omega$, which has the form

$$\Omega' = \omega'_{-n_p-n+1}\cdots\omega'_{-n_p} \;|\; \omega'_{-n_p+1}\cdots\omega'_{-n_p+s} \;|\; \omega'_{-n_p+s+1}\cdots\omega'_{-n_p-n+q}, \tag{5.10}$$

where $\omega'_k = \omega_{k+n}$.

Consider now what happens when the pattern of Eq. (5.10) is perturbed. According to
Eq. (5.6), all $n$ letters in the perturbation region [to the left of the leftmost vertical bar
in Eq. (5.10)] are randomized by the perturbation. We can therefore ignore the effect of
perturbations applied at previous steps. The density that arises from averaging over the
perturbation is made up of all the patterns that come from randomizing the letters in the
perturbation region. As a consequence of the asymptotic equipartition property and
assumption (5.6), there are approximately $2^{nh}$ such patterns, all of which have approximately
the same probability and all of which have approximately the same measure as the unperturbed
pattern (5.10). Thus averaging over the perturbation leads to an entropy increase

$$\Delta H_S \simeq \log_2(2^{nh}) = nh. \tag{5.11}$$

We now turn to estimating the minimum information $\Delta I_{\min}$ about the perturbation
needed to limit the entropy increase to a tolerable value $\Delta H_{\rm tol}$. Consider again the
word (5.10) that describes the unperturbed pattern after $n$ steps. Due to the asymptotic
equipartition property, the $n - (q - s - r) \gg 1$ unspecified letters at the right side of the
decision region correspond to the pattern's extending over

$$\mathcal{R}_n = 2^{[n-(q-s-r)]h} \gg 1 \tag{5.12}$$

typical perturbation cells. This exponential increase in the number of typical perturbation
cells occupied by the pattern continues only until all the typical perturbation cells are
occupied, i.e., until $\mathcal{R}_n = 2^{rh}$ or $n = q - s$. The occupied perturbation cells
partition the unperturbed pattern into $\mathcal{R}_n$ sub-patterns of the form

$$\omega'_{-n_p-n+1}\cdots\omega'_{-n_p} \;|\; \omega'_{-n_p+1}\cdots\omega'_{-n_p+s} \;|\; \omega'_{-n_p+s+1}\cdots\omega'_{-n_p-n+q}\,\tilde\omega_{-n_p-n+q+1}\cdots\tilde\omega_{-n_p+s+r} \;|\;, \tag{5.13}$$

where the $n - (q - s - r)$ letters $\tilde\omega_j$ determine an occupied perturbation cell.

These sub-patterns, all of approximately the same size, are perturbed independently. We
describe the perturbed sub-pattern in each perturbation cell by a symbolic word

$$\hat\omega_{-n_p-n+1}\cdots\hat\omega_{-n_p} \;|\; \omega'_{-n_p+1}\cdots\omega'_{-n_p+s} \;|\; \omega'_{-n_p+s+1}\cdots\omega'_{-n_p-n+q}\,\tilde\omega_{-n_p-n+q+1}\cdots\tilde\omega_{-n_p+s+r} \;|\;, \tag{5.14}$$

where the letters $\hat\omega_i$ are chosen at random according to Eq. (5.6). Again invoking the
asymptotic equipartition property, we can say that in each of the $\mathcal{R}_n$ occupied
perturbation cells, there are

$$D = 2^{nh} \gg 1 \tag{5.15}$$

typical perturbed words, or typical perturbed sub-patterns, of the form (5.14), all having
approximately the same probability and all having approximately the same measure as the
unperturbed sub-pattern (5.13).

These considerations give a total of $D^{\mathcal{R}_n}$ typical perturbed patterns, all produced
with approximately the same probability by the perturbation and all having approximately the
same entropy as the unperturbed pattern (5.10). The information needed to specify a
particular perturbed pattern, and thus the information needed to keep the tolerable entropy
increase essentially to zero, is given by

$$\Delta I_{\min} \simeq \mathcal{R}_n \log_2 D \simeq 2^{[n-(q-s-r)]h}\,\Delta H_S \quad \text{for } \Delta H_{\rm tol} \simeq 0. \tag{5.16}$$

It should be emphasized that the exponential increase of this $\Delta I_{\min}$ continues only
until all the typical perturbation cells are occupied, i.e., until $n = q - s$; for $n > q - s$ the
information continues to increase, but the form of the increase is more difficult to determine.
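
The estimates (5.11), (5.12), and (5.16) are easy to tabulate. The following is a sketch under stated assumptions, not part of the derivation: the function name and the sample parameter values are hypothetical, and the code can only check the weak inequalities in (5.9), not the "much greater than" conditions.

```python
def zero_tolerance_estimates(n, q, s, r, h):
    """Evaluate Eqs. (5.11), (5.12), (5.16) for integer n, q, s, r and KS entropy h.

    Only the ordering in (5.9) is enforced; the caller is responsible for
    choosing n well above max(1, q - s - r).
    """
    if not (q - s >= n > max(1, q - s - r)):
        raise ValueError("parameters violate the ordering assumed in Eq. (5.9)")
    log2_R_n = (n - (q - s - r)) * h   # log2 of occupied perturbation cells, Eq. (5.12)
    dH_S = n * h                       # entropy increase on averaging, Eq. (5.11)
    dI_min = 2.0 ** log2_R_n * dH_S    # information at zero tolerance, Eq. (5.16)
    return {"log2_R_n": log2_R_n, "dH_S": dH_S, "dI_min": dI_min}

est = zero_tolerance_estimates(n=20, q=30, s=2, r=25, h=1.0)
print(est, "ratio dI_min / dH_S =", est["dI_min"] / est["dH_S"])
```

With these illustrative values the information exceeds the entropy increase by the factor $\mathcal{R}_n = 2^{17}$.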

What is going on here has a simple interpretation. Within each perturbation cell, the
perturbed sub-patterns have essentially no overlap. The overall perturbed patterns, however,
can have considerable overlap, since two perturbed patterns are different even if they differ in
only a single perturbation cell. The entropy increase $\Delta H_S \simeq nh$ that comes from
averaging over the perturbation [Eq. (5.11)] is the logarithm of the number $D$ of non-overlapping
patterns that are required to make up the average density. The number of non-overlapping
patterns is the same as the number of perturbed sub-patterns in each perturbation cell, and
hence $\Delta H_S \simeq nh$ is also the information required to specify a particular sub-pattern
within a perturbation cell. To specify a particular overall pattern, however, one must say which
perturbed sub-pattern is realized in each of the $\mathcal{R}_n$ occupied perturbation cells; this
requires giving $\Delta H_S \simeq nh$ bits per occupied perturbation cell, for a total amount of
information $\Delta I_{\min} \simeq \mathcal{R}_n \Delta H_S$ [Eq. (5.16)]. The information
$\Delta I_{\min}$ is much bigger than the average entropy increase $\Delta H_S$ because the
information counts overlapping patterns, whereas the entropy does not.

Now suppose that one allows a nonzero tolerable entropy increase $\Delta H_{\rm tol}$. This means
that one does not have to specify exactly which of the $D^{\mathcal{R}_n}$ perturbed patterns is
realized. Instead, one can group the typical perturbed patterns and specify only to which group
the perturbed pattern belongs. Suppose the typical patterns are grouped into $N$ groups, which
are labeled by an integer $b = 1, \ldots, N$. In analogy to Sec. III, we denote by $N_b$ the number
of patterns in the $b$th group ($\sum_{b=1}^{N} N_b = D^{\mathcal{R}_n}$), by $\rho_b(x)$ the
probability density one obtains by averaging over all the patterns in the $b$th group, by
$\Delta H_b$ the corresponding conditional entropy increase, and by
$\Delta H = \sum_b p_b \Delta H_b$ the average conditional entropy increase. Since all
the patterns are approximately equiprobable, the probability of obtaining the measurement
record $b$, which specifies that the perturbed pattern is in the $b$th group, is
$p_b = N_b D^{-\mathcal{R}_n}$.

To obtain $\Delta I_{\min}$ for a given $\Delta H_{\rm tol}$, one would have to find a grouping of
the patterns that is optimal in the sense of minimizing $\Delta I_{\min}$ under the condition that
$\Delta H \le \Delta H_{\rm tol}$. Since we do not know how to find an optimal grouping, we
construct a nearly optimal grouping as follows. We start with a particular pattern, or fiducial
pattern, and form our first group out of all the patterns that differ in at most $d$ perturbation
cells from the fiducial pattern. Such a group we call a $d$-group. The grouping into $d$-groups is
motivated by the fact that the entropy increase $\Delta H$ is minimal for groups of patterns that
differ in the smallest number of perturbation cells [see Eq. (5.27) below]. There being

$$g_k = \binom{\mathcal{R}_n}{k}(D - 1)^k \tag{5.17}$$

patterns that differ in exactly $k$ cells from an arbitrary fiducial pattern, the number of
patterns differing in at most $d$ cells from an arbitrary fiducial pattern and therefore the size
of a $d$-group is

$$G_d = \sum_{k=0}^{d} g_k = \sum_{k=0}^{d} \binom{\mathcal{R}_n}{k}(D - 1)^k. \tag{5.18}$$

A particularly simple way to proceed would be to pick a second fiducial pattern from
among the patterns not in the first group, forming a second $d$-group about this second
pattern, and then to continue to form $d$-groups until all patterns were grouped. Unfortunately,
this strategy fails because if we proceed in this way, some groups overlap. The problem of
finding a grouping into non-overlapping $d$-groups is equivalent to the problem of finding a
perfect error-correcting code in information theory [31] and generally has no solution. In
the following, we nevertheless assume that the $D^{\mathcal{R}_n}$ patterns are perfectly grouped
into a number $N = D^{\mathcal{R}_n}/G_d$ of $d$-groups. We can make this simplifying assumption
because it lowers our estimate of $\Delta I_{\min}$.
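
The connection to perfect codes can be made concrete with a counting check. The sketch below (function names are hypothetical) evaluates the group size $G_d$ of Eq. (5.18) and tests whether $G_d$ divides $D^{\mathcal{R}_n}$. Divisibility is only a necessary condition for a perfect grouping, not a sufficient one. The two sample cases use binary letters ($D = 2$): with 7 cells and $d = 1$ the count works out, matching the well-known perfect binary Hamming code of length 7, while with 8 cells it does not.

```python
import math

def group_size(R, D, d):
    """G_d of Eq. (5.18): patterns differing from a fiducial one in at most d of R cells."""
    return sum(math.comb(R, k) * (D - 1) ** k for k in range(d + 1))

def passes_counting_test(R, D, d):
    """Necessary (not sufficient) condition for perfect grouping: G_d divides D**R."""
    return D ** R % group_size(R, D, d) == 0

print(group_size(7, 2, 1), passes_counting_test(7, 2, 1))  # 8 divides 2**7
print(group_size(8, 2, 1), passes_counting_test(8, 2, 1))  # 9 does not divide 2**8
```

When $d$ equals the number of cells, the binomial theorem gives $G_d = D^{\mathcal{R}_n}$, so a single group contains every pattern.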

We now turn to the computation of the entropy increase $\Delta H_d$ for a $d$-group, i.e., a
group consisting of a fiducial pattern and all the patterns differing in at most $d$ perturbation
cells from the fiducial pattern. The average density $\rho_d(x)$ for a $d$-group is the average of
the densities for the $G_d$ patterns in the group, all patterns contributing with the same
probability $1/G_d$. Alternatively, we can break each contributing pattern into its
$\mathcal{R}_n$ sub-patterns, i.e., symbolic words of the form (5.14), and view $\rho_d(x)$ as being
made up of contributions from the $D\mathcal{R}_n$ sub-patterns, all of which have approximately
the same measure $\mu_0/\mathcal{R}_n$.

We distinguish two types of sub-patterns, namely the $\mathcal{R}_n$ sub-patterns belonging to the
fiducial pattern and the other $(D - 1)\mathcal{R}_n$ sub-patterns. The average density
$\rho_d(x)$ is uniform on each sub-pattern. We denote its value on sub-patterns belonging to the
fiducial pattern by $\rho_{df}$ and its value on the other sub-patterns by $\rho_{do}$. For a
sub-pattern belonging to the fiducial pattern, the probability obtained by integrating $\rho_d$
over the sub-pattern is

$$\int d\mu(x)\, \rho_d = \rho_{df}\frac{\mu_0}{\mathcal{R}_n} = \frac{p_f}{\mathcal{R}_n}, \tag{5.19}$$

where $p_f$ is the probability obtained by integrating $\rho_d$ over the entire fiducial pattern.
Similarly, for any of the other sub-patterns, the probability obtained by integrating $\rho_d$
over the sub-pattern is

$$\int d\mu(x)\, \rho_d = \rho_{do}\frac{\mu_0}{\mathcal{R}_n} = \frac{p_o}{(D - 1)\mathcal{R}_n}, \tag{5.20}$$

where $p_o = 1 - p_f$ is the probability obtained by integrating $\rho_d$ over all the sub-patterns
outside the fiducial pattern.

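The entropy increase of a $d$-group can now be written as

$$\Delta H_d = -\int d\mu(x)\, \rho_d(x) \log_2[\rho_d(x)] - H_0 = -p_f \log_2 p_f - p_o \log_2 p_o + p_o \log_2(D - 1). \tag{5.21}$$

To evaluate $\Delta H_d$, we must find the integrated probabilities $p_o$ and $p_f$. Each pattern
that differs in exactly $k$ cells from the fiducial pattern contributes the amount
$k/\mathcal{R}_n G_d$ to $p_o$ and the amount $(\mathcal{R}_n - k)/\mathcal{R}_n G_d$ to $p_f$.
It follows that

$$p_o = \sum_{k=0}^{d} g_k \frac{k}{\mathcal{R}_n G_d} = \frac{1}{\mathcal{R}_n G_d} \sum_{k=0}^{d} \binom{\mathcal{R}_n}{k}(D - 1)^k\, k \tag{5.22}$$

and

$$p_f = \sum_{k=0}^{d} g_k \frac{\mathcal{R}_n - k}{\mathcal{R}_n G_d} = \frac{1}{\mathcal{R}_n G_d} \sum_{k=0}^{d} \binom{\mathcal{R}_n}{k}(D - 1)^k (\mathcal{R}_n - k). \tag{5.23}$$

Notice that $p_o = \mathcal{R}_n^{-1}\, d\ln G_d / d\ln(D - 1)$ and that when $d = \mathcal{R}_n$, we
have $G_{\mathcal{R}_n} = D^{\mathcal{R}_n}$, $p_o = 1 - 1/D$, $p_f = 1/D$, and thus
$\Delta H_{\mathcal{R}_n} = \log_2 D = \Delta H_S$.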
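
Equations (5.18) and (5.21) to (5.23) can be evaluated exactly for small cases, which makes the limiting values quoted above easy to confirm. This is a sketch only: the function name is hypothetical, the arguments `R` and `D` stand for $\mathcal{R}_n$ and $D$, and exact rational arithmetic is used purely for convenience.

```python
import math
from fractions import Fraction

def d_group_entropy(R, D, d):
    """Return (G_d, p_o, p_f, dH_d) from Eqs. (5.18), (5.22), (5.23), (5.21)."""
    g = [math.comb(R, k) * (D - 1) ** k for k in range(d + 1)]   # g_k, Eq. (5.17)
    G = sum(g)                                                    # G_d, Eq. (5.18)
    p_o = Fraction(sum(k * gk for k, gk in enumerate(g)), R * G)  # Eq. (5.22)
    p_f = 1 - p_o                                                 # Eq. (5.23)

    def plog2p(p):
        # p * log2(p), with the usual convention 0 * log2(0) = 0
        return 0.0 if p == 0 else float(p) * math.log2(float(p))

    dH = -plog2p(p_f) - plog2p(p_o) + float(p_o) * math.log2(D - 1)  # Eq. (5.21)
    return G, p_o, p_f, dH

print(d_group_entropy(6, 4, 6))  # d = R_n: one group holding every pattern
print(d_group_entropy(6, 4, 0))  # d = 0: each pattern is its own group
```

For $d = \mathcal{R}_n$ the entropy increase equals $\log_2 D$, and for $d = 0$ it vanishes, as expected from the two extreme groupings.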

Under the assumption of perfect grouping into $N = D^{\mathcal{R}_n}/G_d$ $d$-groups, the average
entropy of the $d$-groups is $\Delta H = \Delta H_d$, and the information to specify a particular
$d$-group is

$$\Delta I_d = \log_2 N \simeq \mathcal{R}_n \log_2 D - \log_2 G_d = \mathcal{R}_n \Delta H_S - \log_2 G_d. \tag{5.24}$$

Under our further simplifying assumption that optimal grouping is well approximated by
perfect grouping into $d$-groups, we can approximate the minimum information $\Delta I_{\min}$
required to keep the entropy increase to a tolerable amount $\Delta H_{\rm tol}$ by

$$\Delta I_{\min} \simeq \Delta I_d \quad \text{for } \Delta H_{\rm tol} \simeq \Delta H_d. \tag{5.25}$$

At this point we could plot $\Delta I_{\min}$ as a function of $\Delta H_{\rm tol}$ by using the
common dependence on $d$. Given the assumptions (5.12) and (5.15) that $\mathcal{R}_n$ and $D$ are
large, however, we can introduce further approximations that allow us to write an explicit
expression for $\Delta I_{\min}$ as a function of $\Delta H_{\rm tol}$, valid over nearly the entire
range of $\Delta H_{\rm tol}$. The key to these approximations is that $g_k$ increases exponentially
for $k \ll k_c = (\mathcal{R}_n + 1)(1 - 1/D)$. This means that each of the sums for $G_d$, $p_o$,
and $p_f$ can be approximated by its largest term ($k = d$), provided
$\mathcal{R}_n - d \gg \mathcal{R}_n - k_c \simeq \mathcal{R}_n/D - 1$. The resulting
approximations are

$$G_d \simeq g_d = \binom{\mathcal{R}_n}{d}(D - 1)^d, \qquad p_o \simeq \frac{d}{\mathcal{R}_n}, \qquad p_f \simeq \frac{\mathcal{R}_n - d}{\mathcal{R}_n}. \tag{5.26}$$

In this approximation the entropy increase of a $d$-group is

$$\Delta H_d \simeq -\frac{\mathcal{R}_n - d}{\mathcal{R}_n} \log_2 \frac{\mathcal{R}_n - d}{\mathcal{R}_n} - \frac{d}{\mathcal{R}_n} \log_2 \frac{d}{\mathcal{R}_n} + \frac{d}{\mathcal{R}_n} \log_2(D - 1). \tag{5.27}$$



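Using the same approximation and applying Stirling's formula, one finds that

$$\log_2 G_d \simeq \log_2 \binom{\mathcal{R}_n}{d} + d \log_2(D - 1) \simeq \mathcal{R}_n \left( -\frac{\mathcal{R}_n - d}{\mathcal{R}_n} \log_2 \frac{\mathcal{R}_n - d}{\mathcal{R}_n} - \frac{d}{\mathcal{R}_n} \log_2 \frac{d}{\mathcal{R}_n} + \frac{d}{\mathcal{R}_n} \log_2(D - 1) \right) \simeq \mathcal{R}_n \Delta H_d. \tag{5.28}$$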

Combining Eqs. (5.24), (5.25), and (5.28) yields

$$\Delta I_{\min} \simeq \Delta I_d \simeq \mathcal{R}_n(\Delta H_S - \Delta H_d) \simeq \mathcal{R}_n(\Delta H_S - \Delta H_{\rm tol}). \tag{5.29}$$

This expression, the key result of this paper, shows that to reduce the entropy of a perturbed
chaotic map by an amount $\Delta H_S - \Delta H_{\rm tol}$, one must acquire an amount of
information $\Delta I_{\min}$ about the perturbation which is much larger than the contemplated
entropy reduction. Indeed, the ratio of information to entropy reduction grows exponentially as
$\mathcal{R}_n = 2^{[n-(q-s-r)]h}$ with the number of time steps, the exponential rate of growth
being determined by the KS entropy $h$ of the map. This is what we mean by exponential
hypersensitivity to perturbation.
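
One way to get a feeling for the quality of this approximation is to compare the exact $\Delta I_d = \mathcal{R}_n \log_2 D - \log_2 G_d$ with the estimate $\mathcal{R}_n(\Delta H_S - \Delta H_d)$ for moderately large parameters. The sketch below is an illustration only: the function name and the sample values ($\mathcal{R}_n = 2000$, $D = 64$, $d = 1000$) are arbitrary choices, and $\Delta H_S$ is taken to be exactly $\log_2 D$.

```python
import math
from fractions import Fraction

def exact_and_approx_information(R, D, d):
    """Compare the exact dI_d of Eq. (5.24) with R*(dH_S - dH_d) from Eq. (5.29)."""
    g = [math.comb(R, k) * (D - 1) ** k for k in range(d + 1)]    # g_k, Eq. (5.17)
    G = sum(g)                                                    # G_d, Eq. (5.18)
    p_o = Fraction(sum(k * gk for k, gk in enumerate(g)), R * G)  # Eq. (5.22)
    p_f = 1 - p_o

    def plog2p(p):
        return 0.0 if p == 0 else float(p) * math.log2(float(p))

    dH_d = -plog2p(p_f) - plog2p(p_o) + float(p_o) * math.log2(D - 1)  # Eq. (5.21)
    dH_S = math.log2(D)                # taken as exact here
    exact = R * dH_S - math.log2(G)    # Eq. (5.24); math.log2 accepts large integers
    approx = R * (dH_S - dH_d)         # Eq. (5.29) with dH_tol = dH_d
    return exact, approx

exact, approx = exact_and_approx_information(2000, 64, 1000)
print(f"exact = {exact:.1f} bits, approx = {approx:.1f} bits")
```

For these values the two numbers agree to well under one percent, the residual coming from the logarithmic corrections dropped in Stirling's formula.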

We should investigate the validity of the approximations that lead to our key result (5.29).
This result agrees with what we have already derived in Eq. (5.16) for
$\Delta H_{\rm tol} \simeq 0$. Thus we are mainly interested in knowing where the approximations
fail as $\Delta H_{\rm tol}$ approaches $\Delta H_S$. A more careful analysis, which keeps track of
the errors introduced by the approximation (5.26) and by the use of Stirling's formula in
Eq. (5.28), indicates that we must consider separately two cases: (i) $\mathcal{R}_n \ge D$
($r + s \ge q$), i.e., there are more occupied perturbation cells than there are perturbed
sub-patterns per cell; (ii) $\mathcal{R}_n \le D$ ($r + s \le q$), i.e., there are fewer occupied
perturbation cells than there are perturbed sub-patterns per cell. In case (i), Eq. (5.29) is
valid as long as $\mathcal{R}_n - d \gg \mathcal{R}_n/D \ge 1$, which translates to

$$\mathcal{R}_n \ge D: \quad \Delta H_S - \Delta H_{\rm tol} \gg \frac{1}{D} \iff \Delta I_{\min} \gg \frac{\mathcal{R}_n}{D} \ge 1; \tag{5.30}$$

in case (ii), Eq. (5.29) is valid as long as
$\mathcal{R}_n - d \gg \ln(eD/\mathcal{R}_n) \ge 1 \ge \mathcal{R}_n/D$, which translates to



U n <D: A^-Atf tol »^-(ln(0 




A/ mill > ( ln( —J ) > 1 . (5.31) 



These restrictions arise because of approximations made in evaluating $\Delta I_d$ and
$\Delta H_d$.

There is a separate question of whether perfect $d$-grouping is a good approximation to
optimal grouping. The restrictions contained in Eqs. (5.30) and (5.31) are probably not the
most important restrictions on the validity of our key result, because the very idea of perfect
$d$-grouping as an approximation to optimal grouping is suspect when $\Delta I_{\min}$ is as small
as a few bits. Our hesitancy in defining exponential hypersensitivity to perturbation, where we
require the information-to-entropy ratio (3.13) to grow exponentially for "almost all" values
of $\Delta H_{\rm tol}$, can be traced to this inability to approximate the optimal grouping when
$\Delta H_{\rm tol}$ is very close to $\Delta H_S$. We are left uncertain about the precise
behavior of $\Delta I_{\min}$ when $\Delta H_{\rm tol}$ is very close to $\Delta H_S$.

We can interpret our key result by hearkening back to the interpretation given to
Eq. (5.16). We first need to describe what it means to specify the phase-space density
at a level of resolution defined by a tolerable entropy increase $\Delta H_{\rm tol}$. To do so,
imagine that the sub-patterns within each occupied perturbation cell are aggregated into groups,
which we call coarse-grained sub-patterns, each group consisting of $2^{\Delta H_{\rm tol}}$
sub-patterns so that there are

$$\mathcal{D} = D/2^{\Delta H_{\rm tol}} \simeq 2^{\Delta H_S - \Delta H_{\rm tol}} \tag{5.32}$$

coarse-grained sub-patterns in each occupied perturbation cell. A coarse-grained pattern
consists of coarse-grained sub-patterns, one for each of the $\mathcal{R}_n$ occupied perturbation
cells. Since a coarse-grained pattern has a measure that is approximately
$2^{\Delta H_{\rm tol}}$ times as big as a pattern, a coarse-grained pattern represents an entropy
increase

$$\log_2(2^{\Delta H_{\rm tol}}) = \Delta H_{\rm tol}. \tag{5.33}$$

Thus, specifying the system state at a level of resolution set by $\Delta H_{\rm tol}$ amounts to
specifying a particular coarse-grained pattern.

The further entropy increase that results from averaging over the coarse-grained patterns
is given approximately by

$$\log_2 \mathcal{D} \simeq \Delta H_S - \Delta H_{\rm tol}. \tag{5.34}$$

This entropy increase is the logarithm of the number of non-overlapping coarse-grained
patterns that are required to make up the density that comes from averaging over the
perturbation. This number of non-overlapping coarse-grained patterns is the same as the
number of coarse-grained sub-patterns in each perturbation cell, and hence the entropy
increase (5.34) is also the information required to specify a particular coarse-grained
sub-pattern within a perturbation cell. There being $\mathcal{R}_n$ perturbation cells, the
information needed to specify an entire coarse-grained pattern becomes

$$\Delta I_{\min} \simeq \mathcal{R}_n(\Delta H_S - \Delta H_{\rm tol}) = \mathcal{R}_n \log_2 \mathcal{D}, \tag{5.35}$$

an amount of information that corresponds to a total of $\mathcal{D}^{\mathcal{R}_n}$
coarse-grained patterns, all produced with approximately the same probability by the
perturbation.

The exponential hypersensitivity to perturbation that we have demonstrated here for
maps with positive KS entropy is an asymptotic property for large times. By spelling out
precisely the character of the $n \to \infty$ limit, we can see how exponential hypersensitivity
to perturbation provides an alternative definition of the KS entropy. In discussing the limit, it
is helpful to have in mind the form (5.8) of the initial symbolic word and the form (5.10) of
the unperturbed symbolic word after $n$ time steps. The assumptions (5.9) indicate that as $n$
goes to infinity, we should let $n - (q - r - s)$ go to infinity in the same way as $n$ (this allows
the limit to explore the long-time exponential growth of $\mathcal{R}_n$) while keeping
$q - s - n \ge 0$ constant (this prevents the exponential growth of $\mathcal{R}_n$ from being
halted at the time when there is more than one sub-pattern per perturbation cell). Thus an
appropriate limit is to let $n$, $q$, and $r$ go to infinity, while keeping $s$, $q - n \ge s$, and
$r - n$ constant. In thinking about how this limit is mapped onto phase space, it is convenient
also to let $n_p$ go to infinity while keeping $-n_p + q$ constant; this keeps the rightmost letter
of the initial symbolic word in the same place as we take the limit. With this understanding of
the limit, we can write

$$\lim_{n\to\infty} \frac{1}{n} \log_2\!\left(\frac{\Delta I_{\min}}{\Delta H_S - \Delta H_{\rm tol}}\right) = \lim_{n\to\infty} \frac{\log_2 \mathcal{R}_n}{n} = h. \tag{5.36}$$

In terms of phase space, this long-time limit means that the size of the initial pattern, the
size of a typical perturbation cell, and the strength of the perturbation all go to zero at the
same rate as $n$ goes to infinity.
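
The limit can be followed numerically by holding $s$, $q - n$, and $r - n$ fixed while $n$ grows. The sketch below assumes the form $\log_2 \mathcal{R}_n = [n - (q - s - r)]h$ used in this section; the function name and the default offsets are hypothetical values chosen to satisfy $q - n \ge s$.

```python
def growth_rate(n, h, s=1, q_minus_n=2, r_minus_n=0):
    """(1/n) * log2(R_n) with s, q - n, and r - n held constant as n grows."""
    q = n + q_minus_n
    r = n + r_minus_n
    log2_R_n = (n - (q - s - r)) * h  # exponent of R_n in bits
    return log2_R_n / n

for n in (10, 100, 1000, 10**6):
    print(n, growth_rate(n, h=0.7))
```

With these offsets the exponent is $(n - 1)h$, so the printed rate approaches $h = 0.7$ from below as $n$ increases.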



VI. DISCUSSION 

The objective of this final section is to extract the important ideas from the symbolic 
dynamics and to use them to develop a simple, heuristic picture of hypersensitivity to 
perturbation. Consider a classical system whose dynamics unfolds on a 2F-dimensional 
phase space, and suppose that the system is perturbed by a stochastic Hamiltonian whose 
effect can be described as diffusion on phase space. 

Suppose first that the system is globally chaotic with KS entropy K. For such a system 
a phase-space density is stretched and folded by the chaotic dynamics, developing expo- 
nentially fine structure as the dynamics proceeds. A simple picture is that the phase-space 
density stretches exponentially in half the phase-space dimensions and contracts exponen- 
tially in the other half of the dimensions. 

The perturbation is characterized by a perturbation strength and by correlation cells. We
can take the perturbation strength to be the typical distance (e.g., Euclidean distance with
respect to some fixed set of canonical coordinates) that a phase-space point diffuses under
the perturbation during an $e$-folding time, $F/(K \ln 2)$, in a typical contracting dimension.
The perturbation becomes effective, in the sense described in Sec. V, when the phase-space
density has roughly the same size in the contracting dimensions as the perturbation strength.
Once the perturbation becomes effective, the effects of the diffusive perturbation and of the
further contraction roughly balance one another, leaving the average phase-space density
with a constant size in the contracting dimensions.

The correlation cells are phase-space cells over which the effects of the perturbation are 
well correlated and between which the effects of the perturbation are essentially uncorrelated. 
We assume that all the correlation cells have approximately the same phase-space volume. 
We can get a rough idea of the effect of the perturbation by regarding the correlation cells
as receiving independent perturbations. Moreover, the diffusive effects of the perturbation
during an $e$-folding time $F/(K \ln 2)$ are compressed exponentially during the next such
$e$-folding time; this means that once the perturbation becomes effective, the main effects
of the perturbation at a particular time are due to the diffusion during the immediately
preceding $e$-folding time.

Since a chaotic system cannot be forever shielded from the effects of the perturbation, we
can choose the initial time $t = 0$ to be the time at which the perturbation is just becoming
effective. We suppose that at $t = 0$ the unperturbed density is spread over $2^{-Kt_0}$
correlation cells, $t_0$ being the time when the unperturbed density occupies a single correlation
cell. The essence of the KS entropy is that for large times $t$ the unperturbed density spreads
over

$$\mathcal{R}(t) \simeq 2^{K(t - t_0)} \tag{6.1}$$

correlation cells, in each of which it occupies roughly the same phase-space volume. The
exponential increase of $\mathcal{R}(t)$ continues until the unperturbed density is spread over
essentially all the correlation cells. We can regard the unperturbed density as being made up of
sub-densities, one in each occupied correlation cell and all having roughly the same phase-space
volume.

After $t = 0$, when the perturbation becomes effective, the average density continues to
spread exponentially in the expanding dimensions. This spreading is not balanced, however,
by contraction in the other dimensions, so the phase-space volume occupied by the average
density grows as $2^{Kt}$, leading to an entropy increase

$$\Delta H_S \simeq \log_2(2^{Kt}) = Kt. \tag{6.2}$$

Just as the unperturbed density can be broken up into sub-densities, so the average density
can be broken up into average sub-densities, one in each occupied correlation cell. Each
average sub-density occupies a phase-space volume that is $2^{Kt}$ times as big as the volume
occupied by an unperturbed sub-density.

The unperturbed density is embedded within the phase-space volume occupied by the
average density and itself occupies a volume that is smaller by a factor of $2^{-Kt}$. We can
picture a perturbed density crudely by imagining that in each occupied correlation cell the 
unperturbed sub-density is moved rigidly to some new position within the volume occupied 
by the average sub-density; the result is a perturbed sub-density. A perturbed density is 
made up of perturbed sub-densities, one in each occupied correlation cell. All of the possible 
perturbed densities are produced by the perturbation with roughly the same probability. 

Suppose now that we wish to hold the entropy increase to a tolerable amount
$\Delta H_{\rm tol}$. We must first describe what it means to specify the phase-space density at a
level of resolution set by a tolerable entropy increase $\Delta H_{\rm tol}$. An approximate
description can be obtained in the following way. Take an occupied correlation cell, and divide
the volume occupied by the average sub-density into $2^{\Delta H_S - \Delta H_{\rm tol}}$
non-overlapping volumes, all of the same size. Aggregate all the perturbed sub-densities that lie
predominantly within a particular one of these non-overlapping volumes to produce a
coarse-grained sub-density. There are $2^{\Delta H_S - \Delta H_{\rm tol}}$ coarse-grained
sub-densities within each occupied correlation cell, each having a phase-space volume
that is bigger than the volume occupied by a perturbed sub-density by a factor of

$$\frac{2^{Kt}}{2^{\Delta H_S - \Delta H_{\rm tol}}} = 2^{\Delta H_{\rm tol}}. \tag{6.3}$$

A coarse-grained density is made up by choosing a coarse-grained sub-density in each occupied
correlation cell. A coarse-grained density occupies a phase-space volume that is bigger
than the volume occupied by the unperturbed density by the factor $2^{\Delta H_{\rm tol}}$ of
Eq. (6.3) and hence represents an entropy increase

$$\log_2(2^{\Delta H_{\rm tol}}) = \Delta H_{\rm tol}. \tag{6.4}$$

Thus to specify the phase-space density at a level of resolution set by $\Delta H_{\rm tol}$ means
roughly to specify a coarse-grained density. The further entropy increase on averaging over the
perturbation is given by

$$\log_2(2^{\Delta H_S - \Delta H_{\rm tol}}) = \Delta H_S - \Delta H_{\rm tol}. \tag{6.5}$$

What about the information A/ min required to hold the entropy increase to AH to \l 
Since there are 2 AHs ~ AHto1 coarse-grained sub-densities in an occupied correlation cell, each 
produced with roughly the same probability by the perturbation, it takes approximately 
AH S — AH to \ bits to specify a particular coarse-grained sub-density. To describe a coarse- 
grained density, one must specify a coarse-grained sub-density in each of the 1Z(t) occupied 
correlation cells. Thus the information required to specify a coarse-grained density — and, 
hence, the information required to hold the entropy increase to AH to \ — is given by 

$$ \Delta I_{\rm min} \sim \mathcal{R}(t)\,(\Delta H_S - \Delta H_{\rm tol}) \qquad (6.6) $$

[cf. Eq. (5.35)], corresponding to there being a total of
$(2^{\Delta H_S - \Delta H_{\rm tol}})^{\mathcal{R}(t)}$ coarse-grained densities. The entropy increase (6.5)
comes from counting the number of non-overlapping coarse-grained densities that are required to
fill the volume occupied by the average density, that number being
$2^{\Delta H_S - \Delta H_{\rm tol}}$. In contrast, the information $\Delta I_{\rm min}$ comes from counting the
exponentially greater number of ways of forming overlapping coarse-grained densities by
choosing one of the $2^{\Delta H_S - \Delta H_{\rm tol}}$ non-overlapping coarse-grained sub-densities in each
of the $\mathcal{R}(t)$ correlation cells.
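
The counting behind Eqs. (6.5) and (6.6) can be illustrated with a minimal Python sketch. The function names and the sample values ($\Delta H_S = 5$ bits, $\Delta H_{\rm tol} = 2$ bits, $\mathcal{R}(t) = 8$ cells) are assumptions chosen only for illustration; integer bit counts are assumed so that the counts are exact.

```python
def entropy_increase_bits(delta_h_s, delta_h_tol):
    """Further entropy increase on averaging over the perturbation, as in Eq. (6.5)."""
    return delta_h_s - delta_h_tol


def information_bits(num_cells, delta_h_s, delta_h_tol):
    """Approximate information to specify one coarse-grained density, as in Eq. (6.6)."""
    return num_cells * (delta_h_s - delta_h_tol)


def number_of_coarse_grained_densities(num_cells, delta_h_s, delta_h_tol):
    """Ways of choosing one coarse-grained sub-density in each occupied cell."""
    sub_densities_per_cell = 2 ** (delta_h_s - delta_h_tol)
    return sub_densities_per_cell ** num_cells


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the analysis).
    cells, dh_s, dh_tol = 8, 5, 2
    print(entropy_increase_bits(dh_s, dh_tol))                        # 3
    print(information_bits(cells, dh_s, dh_tol))                      # 24
    print(number_of_coarse_grained_densities(cells, dh_s, dh_tol))    # 16777216
```

With these numbers the entropy changes by 3 bits, while the information is 24 bits, the base-2 logarithm of the $2^{24}$ possible coarse-grained densities.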

The picture developed in this section, summarized neatly in Eq. (6.6), requires that $\Delta H_{\rm tol}$
be big enough that a coarse-grained sub-density is much larger than a perturbed sub-density,
so that we can talk meaningfully about the perturbed sub-densities that lie predominantly
within a coarse-grained sub-density. If $\Delta H_{\rm tol}$ becomes too small, Eq. (6.6) breaks down,
and the information $\Delta I_{\rm min}$, rather than reflecting a property of the chaotic dynamics as in
Eq. (6.6), becomes essentially a property of the perturbation, reflecting a counting of the
number of possible realizations of the perturbation.

The boundary between the two kinds of behavior of $\Delta I_{\rm min}$ is set roughly by the number
$F$ of contracting phase-space dimensions. When $\Delta H_{\rm tol}/F > 1$, the characteristic scale of a
coarse-grained sub-density in the contracting dimensions is a factor of

$$ (2^{\Delta H_{\rm tol}})^{1/F} = 2^{\Delta H_{\rm tol}/F} > 2 \qquad (6.7) $$

larger than the characteristic size of a perturbed sub-density in the contracting dimensions.
In this regime the picture developed in this section is at least approximately valid, because
a coarse-grained sub-density can accommodate several perturbed sub-densities in each
contracting dimension. The information $\Delta I_{\rm min}$ becomes a property of the system dynamics,
rather than a property of the perturbation, because it quantifies the effects of the
perturbation on scales as big as or bigger than the finest scale set by the system dynamics.

In contrast, when $\Delta H_{\rm tol}/F < 1$, we are required to keep track of the phase-space density
on a very fine scale in the contracting dimensions, a scale smaller than the characteristic size
of a perturbed sub-density in the contracting dimensions. Sub-densities are considered to
be distinct, even though they overlap substantially, provided that they differ by more than
this very fine scale in the contracting dimensions. The information $\Delta I_{\rm min}$ is the logarithm of
the number of realizations of the perturbation which differ by more than this very fine scale
in at least one correlation cell. The information becomes a property of the perturbation
because it reports on the effects of the perturbation on scales finer than the finest scale set
by the system dynamics, i.e., scales that are essentially irrelevant to the system dynamics.
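
The two regimes can be made concrete with a short Python sketch of the scale factor in Eq. (6.7). The function names are hypothetical, and the sharp threshold at $\Delta H_{\rm tol}/F = 1$ is an assumption made for the sake of the example; the text above states only that the boundary lies roughly there.

```python
def coarse_graining_scale_factor(delta_h_tol, num_contracting_dims):
    """Size of a coarse-grained sub-density relative to a perturbed sub-density,
    per contracting dimension, as in Eq. (6.7)."""
    return 2.0 ** (delta_h_tol / num_contracting_dims)


def information_regime(delta_h_tol, num_contracting_dims):
    """Rough classification of what the information reflects.

    The hard cut at delta_h_tol / F = 1 is an illustrative assumption;
    the actual boundary is only approximate.
    """
    if delta_h_tol / num_contracting_dims > 1:
        return "system dynamics"
    return "perturbation"


if __name__ == "__main__":
    # Illustrative values only: F = 2 contracting dimensions.
    for delta_h_tol in (1.0, 6.0):
        print(delta_h_tol,
              coarse_graining_scale_factor(delta_h_tol, 2),
              information_regime(delta_h_tol, 2))
```

For $\Delta H_{\rm tol} = 6$ bits and $F = 2$, the factor is $2^3 = 8$, so a coarse-grained sub-density spans several perturbed sub-densities in each contracting dimension; for $\Delta H_{\rm tol} = 1$ bit the factor is below 2.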

We are now prepared to put in final form the exponential hypersensitivity to perturbation 
of systems with a positive KS entropy: 

$$ \frac{\Delta I_{\rm min}}{\Delta H_S - \Delta H_{\rm tol}} \sim \mathcal{R}(t) \sim 2^{Kt} \quad \mbox{for } \Delta H_{\rm tol} > F . \qquad (6.8) $$

Once the chaotic dynamics renders the perturbation effective, this exponential hypersensitivity
to perturbation is essentially independent of the form and strength of the perturbation.
Its essence is that within each correlation cell there is a roughly even trade-off between
entropy reduction and information, but for the entire phase-space density the trade-off is
exponentially unfavorable because the density occupies an exponentially increasing number
of correlation cells, in each of which it is perturbed independently.
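
A minimal Python sketch of the scaling in Eq. (6.8) follows. It assumes the exact form $\mathcal{R}(t) = 2^{Kt}$, whereas Eq. (6.8) asserts only the scaling, and it uses $K = 1$ bit per step, the value for the baker's map; the function names and the entropy values are illustrative assumptions.

```python
def occupied_cells(ks_entropy_bits_per_step, t):
    """Number of occupied correlation cells, assuming exactly R(t) = 2**(K t)."""
    return 2.0 ** (ks_entropy_bits_per_step * t)


def information_to_entropy_ratio(ks_entropy_bits_per_step, t, delta_h_s, delta_h_tol):
    """Delta I_min / (Delta H_S - Delta H_tol), with Delta I_min taken from Eq. (6.6)."""
    entropy_reduction = delta_h_s - delta_h_tol
    information = occupied_cells(ks_entropy_bits_per_step, t) * entropy_reduction
    return information / entropy_reduction


if __name__ == "__main__":
    # K = 1 bit per step; the entropy values are illustrative assumptions.
    for t in range(0, 11, 2):
        print(t, information_to_entropy_ratio(1.0, t, delta_h_s=5.0, delta_h_tol=2.0))
```

The per-cell trade-off cancels in the ratio, which therefore depends only on the number of occupied cells and doubles at every step for $K = 1$.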

As noted above, the behavior of $\Delta I_{\rm min}$ for $\Delta H_{\rm tol} < F$ deviates from the universal behavior
of Eq. (6.8) and tells one about the number of realizations of the perturbation that produce
densities that differ on scales finer than the finest scale set by the system dynamics. For a
diffusive perturbation of the sort contemplated in this section, $\Delta I_{\rm min}$ diverges as $\Delta H_{\rm tol}$ goes
to zero, because a diffusive perturbation has an infinite number of realizations on even the
tiniest scale. If the diffusive perturbation is replaced by a similar perturbation, but with
a finite number of realizations, then the growth of $\Delta I_{\rm min}$ is capped at the logarithm of the
number of realizations, corresponding to the finest scale on which the perturbation acts. The
perturbation used in the symbolic-dynamics analysis of perturbed chaotic maps in Sec. V
is of this latter sort, with a finite number of realizations, the number being
$D^{\mathcal{R}_n} = (2^{n_h})^{\mathcal{R}_n}$.
Indeed, the major simplifying assumption about the perturbation in Sec. V is that the
sub-patterns produced by the perturbation are all different on the finest scale set by the system
dynamics; i.e., there are no overlapping perturbed sub-patterns. This means that the cap
on $\Delta I_{\rm min}$, which occurs at $\Delta I_{\rm min} \sim \log_2(D^{\mathcal{R}_n}) = \mathcal{R}_n n_h$ [cf. Eq. (5.16)], is such that the
universal behavior of Eq. (5.35) extends right down to $\Delta H_{\rm tol} \sim 0$.

What about systems with regular, or integrable, dynamics? Though we expect no universal
behavior for regular systems, we can get an idea of the possibilities from the heuristic
description developed in this section. Hypersensitivity to perturbation requires, first, that
the phase-space density develop structure on the scale of the strength of the perturbation,
so that the perturbation becomes effective, and, second, that after the perturbation becomes
effective, the phase-space density spread over many correlation cells.

For many regular systems there will be no hypersensitivity simply because the phase-space
density does not develop fine enough structure. Regular dynamics can give rise to
nonlinear shearing, however, in which case the density can develop structure on the scale of
the strength of the perturbation and can spread over many correlation cells. In this situation,
one expects the picture developed in this section to apply at least approximately: to hold
the entropy increase to $\Delta H_{\rm tol}$ requires giving $\Delta H_S - \Delta H_{\rm tol}$ bits per occupied correlation
cell; $\Delta I_{\rm min}$ is related to $\Delta H_{\rm tol}$ by Eq. (6.6), with $\mathcal{R}(t)$ being the number of correlation cells
occupied at time $t$. Thus regular systems can display hypersensitivity to perturbation if $\mathcal{R}(t)$
becomes large (although this behavior could be eliminated by choosing correlation cells that
are aligned with the nonlinear shearing produced by the system dynamics), but they cannot
display exponential hypersensitivity to perturbation because the growth of $\mathcal{R}(t)$ is slower
than exponential.
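
The contrast between the two growth laws can be sketched numerically in Python. The linear law used for the sheared (regular) case is purely a hypothetical stand-in for "slower than exponential", and the exact form $2^{Kt}$ for the chaotic case is likewise an assumption; the function names and rates are illustrative only.

```python
def cells_chaotic(ks_entropy_bits_per_step, t):
    """Occupied cells for a chaotic system, assuming exactly R(t) = 2**(K t)."""
    return 2.0 ** (ks_entropy_bits_per_step * t)


def cells_sheared(shear_rate, t):
    """Occupied cells under nonlinear shearing.

    Linear growth is a hypothetical choice; the argument above requires
    only that the growth be slower than exponential.
    """
    return 1.0 + shear_rate * t


if __name__ == "__main__":
    # By Eq. (6.6) the information per bit of entropy reduction scales like R(t).
    for t in (0, 5, 10, 20):
        print(t, cells_chaotic(1.0, t), cells_sheared(1.0, t))
```

In both cases $\mathcal{R}(t)$ can become large, so both can show hypersensitivity, but only the chaotic law makes the ratio of information to entropy grow exponentially.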

A more direct way of stating this conclusion is to reiterate what we have shown in 
this paper: Exponential hypersensitivity to perturbation is equivalent to the spreading of 
phase-space densities over an exponentially increasing number of phase-space cells; such 
exponential spreading holds for chaotic, but not for regular systems and is quantified by a 
positive value of the Kolmogorov-Sinai entropy. 